NETWORK OF CONVOLUTIONAL NEURONS
Patent abstract:
A convolutional neural network (100) comprising a plurality of artificial neurons arranged in one or more convolutional layers. Each convolutional layer comprises one or more output matrices (14), each comprising a set of output neurons. Each output matrix is connected to an input matrix, comprising a set of input neurons, by artificial synapses associated with a convolution matrix comprising the synaptic weight coefficients corresponding to the output neurons of said output matrix. The output value of each output neuron is determined from the input neurons of said input matrix to which the output neuron is connected and from the synaptic weight coefficients of the convolution matrix associated with said output matrix. Each synapse consists of a set of memristive devices comprising at least one memristive device, each set of memristive devices storing a coefficient of said convolution matrix. In response to a change of state of an input neuron of an input matrix, the neural network is able to: dynamically interconnect each set of memristive devices storing the coefficients of the weight matrix to the output neurons connected to said input neuron; and, for each output neuron, accumulate the weight coefficient values stored in said dynamically interconnected sets of memristive devices in an output accumulator (140), which provides the output value of said output neuron.

Publication number: FR3025344A1
Application number: FR1458088
Filing date: 2014-08-28
Publication date: 2016-03-04
Inventor: Olivier Bichler
Applicant: Commissariat a l'Energie Atomique (CEA); Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA)
IPC main class:
Patent description:
[0001] TECHNICAL FIELD The invention relates generally to artificial neural networks, and in particular to the implementation of convolutional neural networks in the form of electronic circuits based on memristive devices.

PRIOR ART Artificial neural networks are schematically inspired by the biological neural networks whose operation they imitate. They consist essentially of neurons interconnected by synapses, which are conventionally implemented by digital memories but can also be implemented by resistive components whose conductance varies as a function of the voltage applied to their terminals. Artificial neural networks are used in various fields of signal processing (visual, sound or other), for example in the field of classification or image recognition.

Convolutional neural networks correspond to a particular model of artificial neural network. They were first described in K. Fukushima's article, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", Biological Cybernetics, 36(4):193-202, 1980. ISSN 0340-1200. doi:10.1007/BF00344251. Numerous developments relating to neural networks have since been proposed, for example in the articles: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 86(11):2278-2324, 1998. ISSN 0018-9219. doi:10.1109/5.726791; and P. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis", Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on, pages 958-963, 2003. doi:10.1109/ICDAR.2003.1227801.

Convolutional neural networks (also called "deep (convolutional) neural networks" or "ConvNets") are feedforward neural networks, inspired by biological visual systems. Applied to image recognition, these networks allow the learning of intermediate representations of objects in images that are smaller and generalizable to similar objects, which facilitates their recognition. Such a network may consist of several convolutional layers, possibly including pooling layers, and is generally followed by a multilayer perceptron classifier, the output of one layer being connected to the input of the next.

In a convolutional layer, each neuron is connected to a sub-matrix of the input matrix. The sub-matrices have the same size. They are offset from each other in a regular way and can overlap. The input matrix may be of any size; however, it is generally two-dimensional when the data to be processed are visual data, the two dimensions then corresponding to the spatial dimensions X and Y of an image.

The neurons are connected to their input sub-matrix I by synapses whose weights are adjustable. The matrix K of the synaptic weights applied to the input sub-matrices of the neurons is the same for all the neurons of the same output map ("feature map"). Such a matrix K is also called a "convolution kernel". The fact that the convolution kernel is shared by all the neurons of the same output map O, and is thus applied to the whole of the input matrix, reduces the memory required for storing the coefficients, which optimizes performance.
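As a purely illustrative aside (not part of the original description), this memory saving from kernel sharing can be made concrete with a short Python sketch; the function name and the layer sizes below are arbitrary choices of ours:

```python
def coefficient_counts(n, m, o_h, o_w):
    """Count stored synaptic weights with and without kernel sharing.

    n, m     -- convolution kernel size
    o_h, o_w -- output map size
    """
    shared = n * m                    # one kernel K shared by all output neurons
    unshared = n * m * o_h * o_w      # one private weight set per output neuron
    return shared, unshared

# Arbitrary example: a 5x5 kernel producing a 28x28 output map.
shared, unshared = coefficient_counts(5, 5, 28, 28)
print(shared, unshared)               # 25 vs 19600 stored coefficients
```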
For image recognition, for example, this sharing makes it possible to minimize the number of filters, or intermediate representations, that best code the characteristics of the image and that can be reused over the whole image. The coefficients of a convolution kernel K (that is, the synaptic weights) may correspond to conventional signal processing filters (Gaussian, Gabor, Laplacian, ...), or may be determined by learning, supervised or unsupervised, for example using the gradient back-propagation algorithm used in multilayer perceptron networks. The coefficients of the convolution kernels can be positive or negative, and are generally normalized between -1 and 1, as are the input and output values of the neurons.

A neuron first computes the weighted sum h of the coefficients of its input sub-matrix by the convolution kernel, that is, the scalar product between the input sub-matrix I and the matrix K: $h = \langle I, K \rangle$. The output of the neuron corresponds to the value of the activation function g of the neuron applied to this sum: g(h). Classically, g can take the form of a sigmoid function, typically the hyperbolic tangent function.

[0002] A convolutional layer may contain one or more convolution kernels, each of which takes an input matrix (which may be the same for several kernels) but has different coefficients corresponding to different filters. Each convolution kernel in a layer produces a different output map, so that the output neurons are different for each kernel.

[0003] Convolutional networks may also include local or global "pooling" layers, which combine the outputs of neuron groups of one or more output maps. The combination of the outputs may, for example, consist of taking the maximum or average value of the outputs of the neuron group for the corresponding output on the output map of the pooling layer.

[0004] The pooling layers make it possible to reduce the size of the output maps from one layer to the next in the network, while improving its performance by making it more tolerant to small deformations or translations in the input data. Convolutional networks may also include fully connected layers of the perceptron type.

[0005] FIG. 1 represents an example of a simple convolutional network, with an input layer "env" corresponding to the input matrix, two convolution layers, "conv1" and "conv2", as well as two fully connected layers "fc1" and "fc2". In this example, the size of the convolution kernels is 5x5 pixels and they are offset by 2 pixels (an offset or "stride" of 2): "conv1" has the input matrix "env" and 6 different convolution kernels producing 6 output maps; "conv2" has 12 different convolution kernels and therefore 12 output maps, and each output map takes as input the set of 6 output maps of the previous layer.

There are solutions consisting in implementing neural networks on graphics processors (GPUs) to significantly improve their performance, such as the solution described in the article by D. C. Ciresan, U. Meier, J. Masci, L. M. [0006] Gambardella, and J. Schmidhuber, "Flexible, high performance convolutional neural networks for image classification", Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI '11, pages 1237-1242, 2011. ISBN 978-1-57735-514-4. doi:10.5591/978-1-57735-516-8/IJCAI11-210.
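Referring back to the weighted sum $h = \langle I, K \rangle$ and the activation g(h) described above, the following minimal NumPy sketch (illustrative only; the names, sizes and the use of tanh as g are our assumptions) computes the output of one neuron of an output map:

```python
import numpy as np

def neuron_output(input_matrix, kernel, i, j, stride=1):
    """Output of one neuron of an output map: scalar product between the
    neuron's input sub-matrix and the shared convolution kernel K,
    passed through the activation function g (here tanh)."""
    n, m = kernel.shape
    sub = input_matrix[i * stride:i * stride + n, j * stride:j * stride + m]
    h = np.sum(sub * kernel)          # weighted sum h = <I_sub, K>
    return np.tanh(h)                 # g(h), a classical sigmoid-type activation

I = np.random.rand(8, 8)              # toy input matrix (e.g. normalized pixels)
K = np.random.uniform(-1, 1, (3, 3))  # shared kernel, coefficients in [-1, 1]
print(neuron_output(I, K, 0, 0))
```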
[0007] An important characteristic of neural networks is their scalability. Indeed, neural networks remain effective regardless of the size of the image base to be learned, as described in: Q. V. Le, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng, "Building high-level features using large scale unsupervised learning", International Conference on Machine Learning, 2012; and A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012.

In their classical version, the elements constituting the input matrix and the output maps are numbers, integer or decimal, in fixed- or floating-point representation. The convolution operation between an input sub-matrix and an output neuron then corresponds to the scalar product between the input sub-matrix I and the kernel K.

In a so-called "pulse" version, the input and output values are encoded with pulses. A value can thus be encoded by the number of pulses during a fixed time window (frequency coding), or by the instant of emission of a pulse according to a rank order coding technique. In the case of frequency coding, the calculation of the weighted sum h is done by accumulating the coefficient of the convolution kernel at each arrival of a pulse on the corresponding input. The activation function of the neuron g can in this case be replaced by a threshold. When the absolute value of h exceeds the threshold following the arrival of a pulse on the input sub-matrix, the output neuron emits a pulse of the sign of h and resets h to the value 0. The neuron then enters a so-called "refractory" period, during which it cannot emit any pulse for a fixed duration. The pulses can therefore be positive or negative, depending on the sign of h when the threshold is exceeded. A negative input pulse inverts the sign of the corresponding kernel coefficient for the accumulation.

An equivalence between the classical version and the pulse version is shown in J. Perez-Carrasco, B. Zhao, C. Serrano, B. Acha, T. Serrano-Gotarredona, S. Chen, and B. Linares-Barranco, "Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing - application to feedforward convnets", IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2706-2719, 2013. ISSN 0162-8828. doi:10.1109/TPAMI.2013.71.

[0008] In particular, a hardware implementation of a pulse convolutional network has been proposed in L. Camunas-Mesa, C. Zamarreno-Ramos, A. Linares-Barranco, A. Acosta-Jimenez, T. Serrano-Gotarredona, and B. Linares-Barranco, "An event-driven multi-kernel convolution processor module for event-driven vision sensors", IEEE Journal of Solid-State Circuits, 47(2):504-517, 2012. ISSN 0018-9200. doi:10.1109/JSSC.2011.2167409. Such an implementation uses a separate digital memory to store the coefficients of the convolution kernels, and requires copying these kernel coefficients from the memory to the computing unit (ALU) at each pulse arrival.
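As a software reference point for the pulse version described above, here is a simplified model of a pulse output neuron; the threshold value and the refractory duration are arbitrary assumptions, and the refractory period is modeled with discrete time steps:

```python
class PulseNeuron:
    """Pulse (spiking) output neuron: accumulates a kernel coefficient at each
    input pulse and emits a signed output pulse when |h| exceeds the threshold."""

    def __init__(self, threshold=1.0, refractory=5):
        self.h = 0.0                 # integration of weighted input pulses
        self.threshold = threshold
        self.refractory = refractory # refractory duration, in time steps
        self.blocked_until = 0       # end of the current refractory period

    def receive(self, weight, sign, t):
        """Process one input pulse of given sign arriving at time step t."""
        if t < self.blocked_until:
            return 0                 # refractory: no output pulse can be emitted
        self.h += sign * weight      # negative pulse inverts the coefficient sign
        if abs(self.h) >= self.threshold:
            out = 1 if self.h > 0 else -1
            self.h = 0.0             # integration is reset after firing
            self.blocked_until = t + self.refractory
            return out
        return 0

neuron = PulseNeuron()
print([neuron.receive(0.4, +1, t) for t in range(5)])  # fires at the 3rd pulse
```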
[0009] Thus, the existing solutions all require a computing unit for the convolution calculation. Moreover, such solutions are limited by the size of the data bus of the digital memories, and first require retrieving the value stored in the memory before being able to perform an operation on it. For example, if the size of the memory bus is 64 bits and the coefficients are stored on 8 bits, only 8 coefficients can be retrieved at each clock cycle. Millions of clock cycles may therefore be required, depending on the number of coefficients in the network. This results in a considerable processing time, as well as a significant energy consumption, to compute all the outputs of the network for one input datum. Such a situation constitutes a fundamental bottleneck of this type of architecture (also called the "Von Neumann bottleneck"). Such a bottleneck can at most be limited (but not eliminated) by the use of SIMD ("Single Instruction on Multiple Data") instructions, which process several data in one processor instruction, and by the use of distributed memory with several computing units, as in GPU graphics processors.

GENERAL DEFINITION OF THE INVENTION The invention improves the situation. For this purpose, it proposes a convolutional neural network comprising a plurality of artificial neurons arranged in one or more convolutional layers. Each convolutional layer comprises one or more output matrices, each comprising a set of output neurons. [0010] Each output matrix is connected to an input matrix, which comprises a set of input neurons, by artificial synapses associated with a convolution matrix comprising the synaptic weight coefficients corresponding to the output neurons of said output matrix. The output value of each output neuron is determined from the input neurons of the input matrix to which the output neuron is connected and from the synaptic weight coefficients of the convolution matrix associated with the output matrix. Each synapse is made up of a set of memristive devices comprising at least one memristive device, each set storing a coefficient of said convolution matrix. In response to a change of state of an input neuron of an input matrix, the neural network is able to: dynamically interconnect each set of memristive devices storing the coefficients of the weight matrix to the output neurons connected to the input neuron; and, for each output neuron, accumulate the values of the weight coefficients stored in the sets of memristive devices dynamically interconnected to the output neuron in an output accumulator, which provides the output value of the output neuron.

According to one embodiment, the neurons may use time coding, the dynamic interconnection being implemented in response to the triggering of an input neuron of the input matrix. In response to the dynamic interconnection, the accumulation of the values of the weight coefficients can then be implemented by propagation of at least one pulse coding the value of each weight coefficient according to the time coding, the values accumulated in the output accumulators constituting the output values of the output matrix. The neural network may comprise a set of switches and a logic circuit for mapping the synaptic weight coefficients to the output neurons connected to the input neuron having undergone a change of state, from the address of the input neuron in the input matrix, in order to achieve the dynamic interconnection. Each pulse may comprise a bit stream coding the destination address (X, Y) of the pulse along two perpendicular axes X and Y, the reference (X, Y) corresponding to the reference frame of the input matrix; when said pulse arrives at the input matrix, the coded address (X, Y) represents the location of the input neuron to be activated.
[0011] According to one embodiment, the dynamic interconnection can be carried out in parallel and in a single cycle, by simultaneously connecting all the weight coefficients stored in the memristive devices to the output neurons connected to the input neuron that has undergone a change of state.

[0012] According to another embodiment, the dynamic interconnection can be performed semi-sequentially, by connecting the coefficients of the weight matrix stored in the memristive devices, one row of the matrix after the other, to the output neurons connected to the input neuron having undergone a change of state. Alternatively, the dynamic interconnection can be performed semi-sequentially, by connecting the weight matrix coefficients stored in the memristive devices, one column of the matrix after the other, to the output neurons connected to the input neuron having undergone a change of state. In yet another variant, the dynamic interconnection can be carried out sequentially, by connecting the convolution matrix coefficients stored in said memristive devices, one after the other, to the output neurons connected to the input neuron having undergone a change of state.

According to one characteristic, the neural network may comprise an accumulator arranged at the output of each synapse, this accumulator accumulating the value of the weight coefficient stored in the memristive devices of the synapse with the value stored in the accumulator of the corresponding output neuron, the value stored in this auxiliary accumulator then being propagated by the pulses into the accumulator of the output matrix. According to another characteristic, the neural network may comprise an accumulator arranged at the level of each output neuron, the output of each memristive device of the synapse being propagated to the output neuron, the value thus propagated being accumulated with the value stored in the accumulator.

[0013] The neuron outputs can be grouped, in which case the accumulated values corresponding to grouped outputs are stored in a common accumulator. According to yet another characteristic, the neural network may implement on-line, STDP-type learning from the dynamic interconnection. The proposed embodiments thus make it possible to compute the convolution directly in the memory, and in parallel, which improves the speed and the energy efficiency of the operation.

BRIEF DESCRIPTION OF THE DRAWINGS Other features and advantages of the invention will become apparent from the following description and the figures of the accompanying drawings, in which: FIG. 1 represents an example of a simple three-layer convolutional network, as well as the output layer; FIG. 2 shows an example of a complex convolutional network, including pooling layers; FIG. 3 is a diagram showing a convolution layer consisting of several output maps/matrices; FIG. 4 illustrates the principle of operation of a convolutional layer in such a network; FIG. 5 is a diagram illustrating the pulse coding and the propagation of the pulses in the neural network; FIG. 6 shows a hardware device implementing a pulse convolutional neural network, according to some embodiments; FIG. 7 is a flowchart showing the convolution method, according to some embodiments;
FIG. 8 is a schematic representation of another example of hardware implementation of convolution operations in a neural network, with grouping of the outputs (downsampling at the output); FIG. 9 is a schematic representation of another example of hardware implementation of convolution operations in a neural network, with STDP learning; FIG. 10 is a flowchart illustrating the dynamic interconnection step, according to a parallel embodiment (in one cycle); FIG. 11 shows an exemplary hardware embodiment of the neural network corresponding to the parallel embodiment (single cycle) of FIG. 10; FIG. 12 is a flowchart illustrating the dynamic interconnection step, according to a semi-sequential embodiment (in n cycles); FIG. 13 represents an example of hardware realization of the neural network in semi-sequential mode in Y; FIG. 14 is a flowchart illustrating the dynamic interconnection step, according to a completely sequential embodiment; FIG. 15 represents an example of hardware realization of the neural network corresponding to the completely sequential embodiment; FIG. 16 is a flowchart illustrating the dynamic interconnection step, according to a semi-sequential embodiment (in m cycles); FIG. 17 represents an example of hardware realization of the neural network in semi-sequential mode in X; FIG. 18 is a flowchart showing the accumulation step, in a kernel-side embodiment; FIG. 19 shows an exemplary hardware embodiment of the accumulator in the kernel-side embodiment, in read-add mode; FIG. 20 represents the hardware embodiment of the accumulation part of FIG. 19 in write-back mode; FIG. 21 is a flowchart showing the accumulation step, according to an output-side accumulation embodiment; and FIG. 22 shows an exemplary hardware embodiment of the accumulator in the output-side accumulation embodiment.

DETAILED DESCRIPTION FIG. 2 shows an example of a convolutional network, including pooling layers, for image classification. The images at the bottom of FIG. 2 represent an extract of the convolution kernels of the first layer, after a gradient back-propagation learning, on an image base such as ImageNet.

An artificial neural network (also called a "formal" neural network, or simply "neural network" hereafter) consists of one or more layers of neurons interconnected with one another. Each layer consists of a set of neurons, which are connected to one or more previous layers; each neuron of a layer can be connected to one or more neurons of one or more previous layers. The last layer of the network is called the "output layer". [0014] The neurons are connected to each other by synapses, or synaptic weights, which weight the efficiency of the connection between the neurons; they constitute the adjustable parameters of the network and store the information it contains. The synaptic weights can be positive or negative.

So-called "convolutional" neural networks (also "deep convolutional networks" or "convnets") are furthermore composed of layers of particular types, such as convolutional layers, pooling layers and fully connected layers. By definition, a convolutional neural network comprises at least one convolution or pooling layer. [0015] As illustrated in FIG. 3, a convolution or pooling layer may consist of one or more output matrices 14 (also called "output maps" or "output feature maps"), each output map being connectable to one or more input matrices 11 (also called "input maps").
As illustrated in FIG. 4, an output matrix denoted $O$ comprises coefficients $O_{i,j}$ and has a size denoted $(O_h, O_w)$. This matrix corresponds to a matrix of neurons, and the coefficients $O_{i,j}$ correspond to the output values of these neurons, computed from the inputs and the synaptic weights. An input matrix or map 11 may correspond to an output map of a previous layer, or to an input matrix of the network that receives the stimuli, or a portion of the stimuli, to be processed. A network may consist of one or more input matrices, for example the RGB, HSV, YUV or any other conventional components of an image, with one matrix per component. An input matrix denoted $I$ comprises coefficients $I_{i,j}$ and has a size denoted $(I_h, I_w)$.

An output map $O$ is connected to an input matrix $I$ by a convolution operation, via a convolution kernel 12 denoted $K$ (the convolution kernel is also called a filter, or convolution matrix), of size $(n, m)$ and comprising coefficients $K_{k,l}$. Each neuron of the output map 14 is connected to a portion of the input matrix 11, this portion being called the "input sub-matrix" or "neuron receptive field" and being of the same size as the convolution matrix $K$. The convolution matrix $K$ comprising the synaptic weights is common to all the neurons of the output map $O$ (the weights of the matrix $K$ are then called "shared weights"). Each output coefficient $O_{i,j}$ of the output matrix then satisfies the following formula:

$$O_{i,j} = g\left( \sum_{k=0}^{\min(n-1,\, I_h - i \cdot s_i)} \; \sum_{l=0}^{\min(m-1,\, I_w - j \cdot s_j)} I_{i \cdot s_i + k,\; j \cdot s_j + l} \cdot K_{k,l} \right)$$

In the above formula, $g()$ denotes the activation function of the neuron, while $s_i$ and $s_j$ denote the vertical and horizontal offset ("stride") parameters, respectively. Such a "stride" offset corresponds to the offset between each application of the convolution kernel on the input matrix. For example, if the offset is greater than or equal to the size of the kernel, then there is no overlap between successive applications of the kernel.

An output map $O$ is connected to an input matrix $I$ by a "pooling" operation, which downsamples the input matrix and provides a subsampled matrix. The downsampling can be of two types:

- "MAX pooling", according to the equation:
$$O_{i,j} = g\left( \max_{0 \le k \le \min(n-1,\, I_h - i \cdot s_i)} \;\; \max_{0 \le l \le \min(m-1,\, I_w - j \cdot s_j)} I_{i \cdot s_i + k,\; j \cdot s_j + l} \right)$$

- "AVERAGE pooling", according to the equation:
$$O_{i,j} = g\left( \frac{1}{n \cdot m} \sum_{k=0}^{\min(n-1,\, I_h - i \cdot s_i)} \; \sum_{l=0}^{\min(m-1,\, I_w - j \cdot s_j)} I_{i \cdot s_i + k,\; j \cdot s_j + l} \right)$$

[0016] The synaptic weights associated with the connections in the case of a pooling layer are generally unitary and therefore do not appear in the formulas above.

A fully connected layer comprises a set of neurons, each neuron being connected to all the inputs of the layer. Each neuron $O_j$ has its own synaptic weights $W_{i,j}$ with the corresponding inputs $I_i$ and performs the weighted sum of the input coefficients with the weights, which is then passed to the activation function of the neuron to obtain the output of the neuron.

[0017] The neuron activation function $g()$ is generally a sigmoid function, such as the $\tanh()$ function. For the pooling layers, the activation function can be, for example, the identity function.
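The formulas above can be transcribed directly into Python; the sketch below is a reference implementation under our own naming, with the boundary limits expressed so that the indices always stay inside the input matrix:

```python
import numpy as np

def conv_coefficient(I, K, i, j, si=1, sj=1, g=np.tanh):
    """O_ij of a convolutional layer, per the convolution formula above."""
    Ih, Iw = I.shape
    n, m = K.shape
    h = 0.0
    # ranges clipped so that i*si + k < Ih and j*sj + l < Iw
    for k in range(min(n, Ih - i * si)):
        for l in range(min(m, Iw - j * sj)):
            h += I[i * si + k, j * sj + l] * K[k, l]
    return g(h)

def max_pool_coefficient(I, i, j, n, m, si, sj, g=lambda x: x):
    """O_ij for MAX pooling (unitary weights, identity activation)."""
    return g(I[i * si:i * si + n, j * sj:j * sj + m].max())

def avg_pool_coefficient(I, i, j, n, m, si, sj, g=lambda x: x):
    """O_ij for AVERAGE pooling over an n x m window."""
    return g(I[i * si:i * si + n, j * sj:j * sj + m].mean())

I = np.arange(16.0).reshape(4, 4) / 16.0
K = np.ones((2, 2)) * 0.25
print(conv_coefficient(I, K, 0, 0), max_pool_coefficient(I, 0, 0, 2, 2, 2, 2))
```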
The synaptic weights are determined by learning. Learning a neural network consists in finding the optimal values of the synaptic weights using an optimization method and a learning base. There are many learning methods, such as gradient back-propagation, the basic principle of which consists, from a stimulus at the input of the network, in computing the output of the network, comparing it to the expected output (in the case of so-called supervised learning) and back-propagating an error signal through the network, which modifies the synaptic weights by a gradient descent method.

Neural networks can be transposed into pulse coding, as illustrated in FIG. 5. In this case, the signals propagated at the input and at the output of the network layers are no longer numerical values, but electrical pulses (comparable to Dirac pulses). The information that was coded in the value of the signal (normalized between -1 and 1) is then coded temporally, either by the order of arrival of the pulses (rank order coding) or by the frequency of the pulses.

[0018] In the case of rank order coding, the instant of arrival of the pulse is inversely proportional to the absolute value of the signal to be coded. The sign of the pulse then determines the sign of the value of the signal to be coded. In the case of frequency coding, the frequency of the pulses, between $f_{min}$ and $f_{max}$, is proportional to the absolute value of the signal to be coded; the sign of the pulse determines the sign of the value of the signal to be coded. For example, considering an input matrix of the network corresponding to the luminance component of an image, normalized between 0 and 1: a white pixel (or matrix coefficient), coded by a value 1, will emit pulses at a frequency $f_{max}$; a black pixel, coded by a value 0, will emit pulses at a frequency $f_{min}$; and a gray pixel, coded by a value $x$, will emit pulses at a frequency $f = f_{min} + x \cdot (f_{max} - f_{min})$. The coding may also be pseudo-frequency based, for example Poissonian: in this case, $f_{max}$ and $f_{min}$ represent average frequencies only. The initial phase of the pulses can be random. The pulses can also come directly from a sensor, such as an artificial retina or cochlea, mimicking the operation of their biological equivalent.

In the case of a pulse neuron, the weighted sum of the synaptic weights with the input signals is replaced by the integration of the pulses coming from the same inputs, weighted by the synaptic weights. [0019] All the pulses are identical except for their sign, so that their integral can be considered unitary or normalized. Moreover, in the pulse approach, the activation function of the neuron is replaced by a threshold, which is unitary in the case of synaptic weights normalized between -1 and 1. When the absolute value of the integration of the neuron exceeds the threshold, the neuron emits an output pulse, which is propagated to all the neurons of the following layers connected to this neuron. The sign of this pulse corresponds to the sign of the integration of the neuron. When the neuron emits a pulse, its integration is reset and the neuron enters a so-called "refractory" period. [0020] When a neuron is in a "refractory" period, it cannot emit a new output pulse until the end of this period, which may be equal to the minimum period of the pulses propagating in the network.

The artificial synapses of a neural network can be realized from memristive devices. A memristive device is a two-terminal electronic component that behaves like a resistance or a conductance, the value of which can be changed by applying a current or a voltage across its terminals.
A device of this type can be characterized by the following equations:

$$i = G \cdot v$$
$$\frac{dG}{dt} = f(v, G)$$

In the equations above, $G$ denotes the conductance of the device, which connects its input current $i$ to the voltage $v$ at its terminals.

[0021] A memristive device may be binary and/or stochastic. A family of memristive devices particularly suited to the realization of artificial synapses in a neural network is such that the characteristic $f(v, G)$ is nonlinear, as for example the devices of the MIM (Metal-Insulator-Metal) type, which constitute the basic cell of several nonvolatile memory technologies such as Resistive Random Access Memory (RRAM), Conductive-Bridging RAM (CBRAM) and Oxide-based Resistive Random Access Memory (OxRAM). Thus, a synapse can be composed of one or more memristive devices. Many other technologies can also be considered memristive, such as phase change memory (PCRAM), floating-gate transistors, memristors, organic memristors, or the NOMFET (Nanoparticle Organic Memory Field Effect Transistor).

An artificial neural network can be realized by using such memristive devices as artificial synapses and by integrating them into a "crossbar" type structure. FIG. 6 shows a hardware device 100 implementing an output map of a convolutional layer of a pulse convolutional neural network, according to some embodiments. [0022] According to one aspect of the invention, the device 100 is constituted of synapses 10, each synapse being implemented from one or more memristive devices, to perform the convolution operations of a convolutional neural network 11 without requiring computing units for such operations nor copy operations of the convolution kernel. In the following description, reference will be made mainly to a single "memristive device" as the element of a synapse, for illustration. However, the invention also applies to a synapse consisting of several memristive devices in parallel, whose equivalent conductance corresponds to the sum of the conductances of the individual devices.

The matrix ("crossbar") in which the memristive devices 10 are incorporated is designated by reference 5. In FIG. 6, the coefficients of the kernel 12 are numbered from 1 to 9 and spread vertically. Although not limited to such applications, the device 100 may be used to realize a hardware implementation of convolutional neural networks for image classification and recognition. The remainder of the description is given essentially with reference to input maps comprising pixel-type neurons, for illustrative purposes.

The convolution operations implemented by the device 100 make it possible to determine each coefficient $O_{i,j}$ of the output matrix 14 for a given convolutional layer, from the input matrix 11 (also denoted $I$) of the convolutional layer and from the convolution kernel 12 (also denoted $K$). The convolution operation from which each output coefficient $O_{i,j}$ is defined is given by equation (1) below:

$$O_{i,j} = g\left( \sum_{k=0}^{\min(n-1,\, I_h - i \cdot s_i)} \; \sum_{l=0}^{\min(m-1,\, I_w - j \cdot s_j)} I_{i \cdot s_i + k,\; j \cdot s_j + l} \cdot K_{k,l} \right) \quad (1)$$

In this equation, the coefficients $I_{i,j}$ represent the coefficients of the input matrix $I$ of the convolutional layer considered, and the coefficients $K_{k,l}$ represent the coefficients of the convolution kernel 12 (weights of the inter-neuronal connections). More particularly, the neural network device 100 uses a plurality of memristive devices 10 to store the convolution kernel(s) 12 associated with an output matrix of the convolutional layer.
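Going back to the device equations $i = G \cdot v$ and $dG/dt = f(v, G)$ given at the beginning of this passage, the following toy model illustrates them in Python; the nonlinear characteristic f and all numerical constants below are illustrative assumptions of ours, not values from the description:

```python
import numpy as np

class Memristor:
    """Two-terminal device: current i = G * v (Ohm's law); the conductance G
    evolves through a nonlinear characteristic f(v, G) above a write threshold."""

    def __init__(self, G=1e-6, G_min=1e-7, G_max=1e-5):
        self.G, self.G_min, self.G_max = G, G_min, G_max

    def current(self, v):
        return self.G * v            # read: the synaptic weighting itself

    def write(self, v, dt=1e-6, k=5e3, v_th=0.5):
        """Toy f(v, G): no change below |v| < v_th (nonlinearity); otherwise a
        conductance change proportional to the overdrive, clipped to the range."""
        if abs(v) > v_th:
            dG = k * (abs(v) - v_th) * np.sign(v) * self.G * dt
            self.G = float(np.clip(self.G + dG, self.G_min, self.G_max))

m = Memristor()
print(m.current(0.2))                # read pulse below v_th: no disturb
m.write(1.0)                         # programming pulse increases G
print(m.G)
```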
Each device 10 has an electrically switchable resistance value and can be of any type (such as, for example, a transistor). In one embodiment, the device 100 may use one or more memristive devices, constituting a synapse, for each coefficient $K_{k,l}$ of the convolution kernel.

[0023] The pulses transiting between the layers of the neural network may be coded in the AER ("Address-Event Representation") format. In this format, the pulse is digital and consists of a bit stream coding the destination address (X, Y) of the pulse along two perpendicular axes X and Y, the reference (X, Y) corresponding to the reference frame of the input matrix, as well as the sign of the pulse. When the pulse arrives on the input matrix, the coded address (X, Y) represents the location of the input neuron to be activated, with X → j and Y → i.

[0024] According to another characteristic, the device 100 can additionally apply a time coding of the information for carrying out each convolution between the input matrix $I$ of a given convolutional layer and the convolution kernel 12, the weighting of the input coefficients $I_{i,j}$ of the input matrix with the coefficients $K_{k,l}$ of the convolution kernel then being time-multiplexed. According to one aspect of the invention, the weighting operation is carried out directly in the memristive devices 10 storing the coefficients corresponding to the weights of the convolution kernel, by Ohm's law: $I = G \cdot U$, where $G$ denotes the conductance of the one or more memristive devices forming a synapse, and $U$ denotes the (fixed) voltage of a pulse, the value of the input coefficient being coded temporally (for example by frequency coding or rank coding). The parameter $I$ is the value to be accumulated in the corresponding output neuron.

[0025] An input neuron of the input matrix 11 belongs to the receptive field of one or more output neurons 140. [0026] In FIG. 6, a neuron of the input matrix is activated (a pixel in the example considered) and is surrounded by a black frame. In the representation of the activation state of the matrix 11, pixel 28 of the input matrix is activated. Assuming that the convolution kernel has a size of 3x3 and that the "stride" offset is 1, the input pixel considered belongs to the receptive field of 9 output neurons (upper part of FIG. 6):

- the activated pixel is connected to the receptive field of output neuron No. 8, with coefficient No. 9 of the convolution kernel (lower part of FIG. 6);
- the activated pixel is connected to the receptive field of output neuron No. 9, with coefficient No. 8 of the convolution kernel;
- the activated pixel is connected to the receptive field of output neuron No. 10, with coefficient No. 7 of the convolution kernel;
- etc.

According to another aspect of the invention, the coefficients $I_{i,j}$ of the input matrix $I$ can be implemented physically by a predefined pulse frequency, while the $K_{k,l}$ coefficients are stored in the form of an electrical conductance of a memristive device 10. Thus, each memristive device 10 is configured to receive a pre-synaptic pulse emitted by a neuron of the input matrix 11 to which it is connected upstream. As used herein, the term "pre-synaptic pulse" refers to a pulse emitted by a neuron towards a synapse to which it is connected downstream.
A pre-synaptic pulse thus emitted from an input neuron of the matrix 11 propagates towards the synapse, materialized by the memristive device 10, to which the input neuron is connected. An input pulse propagates from the input neuron to the output neurons that have the input neuron in their receptive field, as shown in FIG. 6. The pulse arriving at each output neuron is weighted by the synaptic weight corresponding to the coefficient of the associated convolution kernel for this input. This weighting can be done electrically by modulating the voltage of the pulse with the equivalent conductance of the synapse (consisting of one or more memristive devices), thanks to Ohm's law. A pulse modulated in this manner arriving at an output neuron is then integrated in the neuron (analogically or digitally), which emits an output pulse if this integration exceeds a threshold value.

According to a characteristic of the invention, the device 100 comprises an interconnection matrix 13 configured to dynamically associate (a so-called "dynamic mapping" operation) each output 121 of the convolution kernel, corresponding to one coefficient of the kernel, with a position 140 of the accumulation matrix 14, upon activation of an input neuron. The dynamic mapping operation maps the kernel coefficients onto the set of output neurons involving the activated input in the convolution calculation, as described by the previous equations. For example, as shown in FIG. 6, when the input neuron 1 is triggered, the horizontal line intersecting the vertical line connected to element 9 of the matrix 12 representing the convolution kernel is dynamically connected, from its end 121, to position 8 of the accumulator 14. Thus, for an activated input $I_{i,j}$ of the input matrix (activated for example by an AER pulse with address (X, Y), coming from a previous layer or from a retina- or artificial-cochlea-type sensor), the coefficients $K_{k,l}$ of the convolution kernel are connected to the outputs $O_{(i-k)/s_i,\,(j-l)/s_j}$ as follows:

$$K_{k,l} \rightarrow O_{\frac{i-k}{s_i},\,\frac{j-l}{s_j}} \quad \text{for} \quad k = i \bmod s_i;\ k < \min(n,\, i+1);\ k \leftarrow k + s_i \quad \text{and} \quad l = j \bmod s_j;\ l < \min(m,\, j+1);\ l \leftarrow l + s_j$$

The dynamic interconnection operation (dynamic mapping) thus makes it possible to invert the convolution equation (1). The convolution operation is performed by accumulating the dynamically connected convolution kernel coefficients in the respective accumulators 140 at each pulse arrival on the outputs 121, the pulse being propagated on the corresponding vertical line. [0027] The dynamic mapping between the convolution kernel 12 and the outputs 140 is made possible by the pulse coding of the inputs applied to the memristive devices. When the pulse propagation is triggered, according to the predefined pulse frequency, the pulses are propagated to the respective accumulators 140 to which they are connected by the interconnection matrix, through the memristive devices.
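The dynamic mapping rule above can be checked with a short sketch: given an AER-style event carrying the address (X, Y) of the activated input (abstracted here as plain integers, with X → j and Y → i), it enumerates the (kernel coefficient, output neuron) pairs to be interconnected. The names are ours, and the clipping of output coordinates at the lower-right edge of the output matrix is omitted for brevity:

```python
def mapped_outputs(i, j, n, m, si=1, sj=1):
    """Pairs (K[k, l], O[(i-k)/si, (j-l)/sj]) connected when input I[i, j] fires."""
    pairs = []
    k = i % si
    while k < min(n, i + 1):
        l = j % sj
        while l < min(m, j + 1):
            pairs.append(((k, l), ((i - k) // si, (j - l) // sj)))
            l += sj
        k += si
    return pairs

def on_aer_event(x, y, sign, n, m, si=1, sj=1):
    """Decode an AER event (X -> column j, Y -> row i) and map the kernel."""
    i, j = y, x
    return sign, mapped_outputs(i, j, n, m, si, sj)

# As in FIG. 6: 3x3 kernel, stride 1 -> an interior pixel reaches 9 output neurons.
sign, pairs = on_aer_event(x=4, y=5, sign=+1, n=3, m=3)
print(len(pairs))                    # 9
```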
FIG. 7 is a flowchart showing the convolution method according to some embodiments. In step 100, a neuron of the input matrix $I$, belonging to a given convolutional layer, is triggered by an event. An input neuron belongs to the receptive field of one or more output neurons (i.e. the input neuron is connected to these output neurons). The input neuron can be of any type, such as a pixel in the case of a pulse-coded input image. The triggering event may, for example, come from the previous convolutional layer, from an event-based sensor such as a retina or a cochlea, or from a frequency coding of static data such as images or a time-frequency spectrum.

In response to the triggering of the input neuron, in step 102, a dynamic mapping is implemented to map the convolution kernel coefficients to the outputs for which the input pixel has a contribution. [0028] In this step, the memristive devices 10 storing the convolution kernel coefficients are dynamically connected to the corresponding output neurons. The dynamic mapping step 102 between the memristive devices 10 and the output neurons can be implemented according to different types of dynamic routing methods, such as:

- parallel routing;
- semi-sequential routing in Y;
- sequential routing; or
- semi-sequential routing in X.

Step 102 is repeated for each newly triggered input neuron, if that input neuron is different from the previously triggered one. The dynamic routing methods can be analog or digital. Each type of routing method offers a different compromise between the number of cycles and the number of switches. Parallel routing, for example, propagates the input pulse to all the output neurons for which it has a contribution at once, simultaneously, but requires a higher number of switches for the dynamic routing.

An accumulation step 103 may then be implemented from the propagation of the pulses. In one embodiment, the accumulation step 103 is implemented on the convolution kernel side, in two sub-steps (read-add, then write-back). Alternatively, it can be implemented on the output side in a single add-write step.

In the embodiment where the accumulation is implemented on the convolution kernel side in two sub-steps, for each output neuron 141 connected to a memristive device 10, the accumulation value is first propagated from each output neuron to an intermediate accumulator (accumulator 22 in FIGS. 8 and 9) present in the output nodes 121 of the matrix 12 containing the coefficients of the convolution kernel stored by the memristive devices 10. Depending on the type of routing, the pulses can be propagated simultaneously or not; in the case of parallel routing, for example, they can be propagated simultaneously. At the same time, the input pulse is propagated to each synapse and is weighted by the value of the synapse. For each synapse, the weighted pulse is added, in the intermediate accumulator present in the output nodes 121, to the accumulation value previously stored in the accumulator 140 of the corresponding output neuron 141 (read-add step). This read-add step can be performed in parallel, depending on the type of routing chosen for the dynamic mapping. In a second phase, when the accumulation has been performed for all the output neurons 141 identified in the receptive field of the input neuron activated in step 100, the values accumulated in the intermediate accumulators are propagated to the corresponding output neurons 141 and stored in the corresponding accumulators 140 (write-back step). This write-back step can also be performed in parallel, depending on the type of routing chosen for the dynamic mapping.
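A minimal sketch of this kernel-side, two-phase accumulation (read-add then write-back), with stride 1 for brevity; the dictionary accumulators stands in for the accumulators 140 and the dictionary intermediate for the intermediate accumulators 22 (a simplification under our own assumptions):

```python
def propagate_event(i, j, kernel, accumulators):
    """Two-phase, kernel-side accumulation for one input event at I[i, j]
    (stride 1 for brevity)."""
    n, m = len(kernel), len(kernel[0])
    # dynamic mapping: K[k, l] -> output O[i - k, j - l] (stride 1)
    pairs = [((k, l), (i - k, j - l))
             for k in range(min(n, i + 1)) for l in range(min(m, j + 1))]
    # read-add: each weighted pulse is added, in an intermediate accumulator,
    # to the value previously stored in the output neuron's accumulator
    intermediate = {out: accumulators[out] + kernel[k][l]
                    for (k, l), out in pairs if out in accumulators}
    # write-back: the intermediate values are propagated back to the outputs
    accumulators.update(intermediate)

kernel = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
accs = {(oi, oj): 0.0 for oi in range(6) for oj in range(6)}
propagate_event(5, 4, kernel, accs)
print(accs[(5, 4)], accs[(3, 2)])    # 0.1 and 0.9 accumulated
```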
In the embodiment where the accumulation is implemented on the output neuron 141 side, in a single add-write step, the accumulation is performed directly in the output neurons 141 and can be performed in parallel, depending on the type of routing chosen for the dynamic interconnection. In another embodiment, where the dynamic mapping operation is not implemented in parallel, the read-add step can be performed for a first synapse group, then the write-back step can be executed for this same first group, before moving on to the next group and changing the dynamic mapping. Such an embodiment avoids performing the same mapping twice, a first time for the read-add and a second time for the write-back. In addition, it makes it possible to reduce the number of intermediate accumulators, since the same accumulators can be reused from one synapse group to the next.

[0029] The neural network can be realized by integrating the memristive devices 10 into a "crossbar" type structure, with or without a selection device (transistor, diode or other device with a non-linearity).

FIG. 8 is a schematic representation of another example of hardware implementation of convolution operations in a neural network, with grouping of the outputs by downsampling. This operation consists in gathering the neighboring outputs to form a new, smaller output matrix 14. [0030] In the embodiment of FIG. 8, the grouping of the outputs is carried out directly at the accumulation level, by pooling the accumulators 140 corresponding to the same output into groups 142 in the scaled matrix 14. The number of accumulators and the size of the interconnection matrix are thus reduced.

According to one characteristic of the invention, the maximum size of the matrix of memristive devices 10 may advantageously be equal to the size of the convolution kernel 12, for example 35x35, 11x11 or 5x5. The size may be limited in particular by the leakage currents. In embodiments using the interconnection matrix 13, this (analog) interconnection matrix may be small and may be realized in CMOS; alternatively, it can be realized physically with memristive devices. The neural network device 100 may additionally operate with a learning method, such as an in-situ (on-line) learning method based on the STDP (Spike Timing Dependent Plasticity) rule. The interconnection matrix then performs the address decoding to map the synaptic weights to the output neurons for a given activated input.

[0031] In the example of FIG. 9, each synapse consists of 3 devices (each line implementing a coefficient of the convolution kernel). In this case, a pulse can be propagated simultaneously on all the columns, to obtain at the end of each line a current pulse weighted by the equivalent conductance of the synapse, which corresponds to the sum of the conductances of the devices constituting the synapse. The pulses are propagated on the vertical lines of the crossbar matrix 5. In such an embodiment (a synapse consisting of several memristive devices), the accumulation step 103 of FIG. 7 can be implemented sequentially, by activating one device after the other, with an adder (1 + x) per line, or synapse, the value "1" being added to the value "x" accumulated by the corresponding output neuron 140. This is equivalent to having several columns of devices, as in FIG. 9.
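A toy illustration of such a multi-device synapse: the equivalent conductance is the sum of the device conductances, and the sequential accumulation with a (1 + x) adder per line can be mimicked as follows (binary devices and the unit weight g_unit are our assumptions):

```python
class MultiDeviceSynapse:
    """Synapse made of several memristive devices in parallel: the equivalent
    conductance is the sum of the individual device conductances."""

    def __init__(self, conductances):
        self.G = list(conductances)          # one conductance per device

    def equivalent_conductance(self):
        return sum(self.G)

    def accumulate_sequentially(self, x, g_unit):
        """Sequential accumulation with a (1 + x) adder per line: each device
        in the 'ON' state adds one unit weight g_unit to the running value x."""
        for g in self.G:
            if g > 0:                        # device active ('ON')
                x = 1 * g_unit + x           # the (1 + x) adder
        return x

syn = MultiDeviceSynapse([1, 0, 1])          # binary devices: 2 of 3 are 'ON'
print(syn.equivalent_conductance())          # 2
print(syn.accumulate_sequentially(0.0, g_unit=1 / 3))  # weight 2/3 accumulated
```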
Alternatively, the accumulation step can be implemented in a single step, using multiple thresholds (as many as there are devices per synapse) and one adder per line. The thresholds may be increasing multiples of the base threshold, so as to perform a digitization of the equivalent synaptic weight stored in the set of devices constituting a line. This embodiment is particularly suitable in the case of binary devices, able to store only an "ON" (active) or "OFF" (inactive) state. The base threshold is set to trip when at least one of the devices is in the "ON" state. For example, if the equivalent synapse consists of 4 devices, 2 of which are in the "ON" state, the first two thresholds out of the 4 will trigger, thus encoding the synapse value 2/4. Such an embodiment can be implemented using digital accumulators 140 or, alternatively, analog accumulators, using one analog-to-digital converter (ADC) per line.

FIG. 9 also schematically represents an example of hardware implementation of convolution operations in a neural network with STDP ("Spike-Timing-Dependent Plasticity") learning, that is, plasticity depending on the time of occurrence of the pulses. [0032] The STDP learning can be done in one step, with interaction of the pre- and post-synaptic pulses. In the context of an STDP learning rule implemented by the neural network, pre-synaptic and post-synaptic pulses may for example be sent by the input and output neurons towards a synapse (constituted by one or more memristive devices) to act on the variation of its conductance, for example as described in FR2977351 B1.

In this embodiment, a write circuit 15 is used. Following the activation of an output neuron, the coefficients of the associated convolution kernel are mapped to the inputs constituting the receptive field of the output neuron (dynamic mapping). The coefficients of the kernel are then modified in the following manner (a sketch of this rule is given after the list):

- if the time of last activation of the input, $t_{pre}$ (pre-synaptic time), immediately precedes the activation time of the neuron, $t_{post}$ (post-synaptic time), and lies within the potentiation time window of the neuron (LTP window), then the weight of the synapse (kernel coefficient) is increased (LTP);
- if the last activation time of the input does not lie within the LTP window, then the weight of the synapse is decreased (LTD).
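The sketch announced above, implementing this LTP/LTD rule; the window length and the weight increments are illustrative assumptions, since the description does not fix them:

```python
def stdp_update(w, t_pre, t_post, ltp_window=10e-3,
                dw_ltp=0.01, dw_ltd=0.005, w_min=0.0, w_max=1.0):
    """Update one kernel coefficient w given the last pre-synaptic time t_pre
    and the post-synaptic activation time t_post (times in seconds)."""
    if 0.0 <= t_post - t_pre <= ltp_window:
        w += dw_ltp      # LTP: the input fired just before the neuron
    else:
        w -= dw_ltd      # LTD: the input is outside the LTP window
    return min(max(w, w_min), w_max)

print(stdp_update(0.5, t_pre=0.998, t_post=1.0))   # inside the window -> 0.51
print(stdp_update(0.5, t_pre=0.9,   t_post=1.0))   # outside the window -> 0.495
```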
Steps 112 and 114 correspond to the dynamic interconnection between the outputs 121 of the matrix 12 and the nodes 141 corresponding to the output neurons that are in the receiver field of the triggered neuron. In step 112, a decoding of the X address of the input neuron is performed. [0035] The corresponding lines of the matrix 12 are then activated (each line corresponds to a different synapse or coefficient of the nucleus). In step 114, a decoding of the Y address of the input neuron is performed. The corresponding lines of the matrix 12 are then activated. In step 116, an accumulation of the values is carried out in the output matrix zo 14 by propagation of the pulses as previously described. FIG. 13 represents an example of a hardware embodiment of the neural network corresponding to a parallel embodiment (single cycle) according to the embodiment of FIG. 12. The accumulators 140 (right-hand part) are connected to the coefficients of the core 25 corresponding to after X and Y decoding. The number of NTG switches used in the hardware realization of the neural network is then given by the following equation: NTG = m. not. Oh, my. Ow. This embodiment makes it possible to perform convolutional computation in a single cycle, but also requires a larger number of switches. This mode is particularly suitable when the speed of processing of the system is important. [0036] Fig. 12 is a flowchart illustrating the step of dynamic interconnection, according to a semi-sequential embodiment (in n cycles). In step 130, an input neuron is triggered (for example a pixel). The dynamic interconnection between the outputs 121 of the array 12 and the nodes 141 corresponding to the output neurons that are in the triggered neuron field of the triggered neuron comprises steps 132 to 135. More specifically, in step 132, a decoding of the X address of the input neuron is performed. The corresponding columns of the matrix 12 are then activated. In step 134, a decoding of the Y address of the input neuron is performed. The number of lines to be activated sequentially is defined by the parameter n, which corresponds to the number of lines in the convolution matrix K (12). In step 135, a line is activated (current line). If a line was previously activated, this previous line is deactivated beforehand. In step 136, an accumulation of the values is performed in the output matrix zo 14 by pulse propagation on the current activated line. The next line is then activated in step 138 if the number of lines already activated is less than n (i.e. the iteration number of steps 135/136 is less than n). FIG. 13 shows an exemplary hardware embodiment of the neural network corresponding to a semi-sequential embodiment in Y, according to the embodiment of FIG. 12. [0037] The accumulators 140 (right-hand part) are connected to the corresponding core coefficients after decoding in X and Y. The matrix 12 (left-hand part) is made physically by a set of memristive devices 10. The accumulation in the accumulators 140 is performed for each line activated sequentially by a sequencer. The number of NTG switches used in the hardware realization of the neural network is then given by the following equation: NTG = M. n + m. Ow. This embodiment offers a compromise between the fully parallel mode and the fully sequential mode. [0038] Fig. 14 is a flowchart illustrating the dynamic interconnection step, according to a completely sequential embodiment (in m.n cycles). In step 150, an input neuron is triggered (for example a pixel). 
The dynamic interconnection between the outputs 121 of the array 12 and the nodes 141 corresponding to the output neurons that are in the triggered neuron field of the triggered neuron comprises steps 152 to 156. More specifically, in step 152, a decoding of the Y address of the input neuron is performed. The number of lines to activate sequentially is defined by the parameter n. In step 153, a line is activated (current line). If a line has been zo previously activated, this previous line is deactivated beforehand. In step 154, a decoding of the X address of the input neuron is performed. The number of columns to be activated sequentially is defined by the parameter m, which corresponds to the number of columns in the convolution matrix K (12). In step 155, a column is activated (current column). If a column has been previously activated, this previous column is deactivated beforehand. [0039] In step 156, an accumulation of the values is performed in the output matrix 14 by pulse propagation on the activated current rows and columns. The next column is then activated in step 157 if the number of lines already activated is less than m (i.e. iteration number of steps 153/156 is less than 5 n). The next line is then activated in step 158 if the number of lines already activated is less than n (i.e. the iteration number of steps 153/156 is less than n). It should be noted that the order of the symmetrical steps relating to the processing of the rows and columns (152-153 and 154-155; 157 and 158) can be reversed. FIG. 15 shows an example of hardware realization of the neural network corresponding to a completely sequential embodiment, according to the embodiment of FIG. 14. Each accumulator 140 (right-hand part) is sequentially connected to the corresponding core coefficient after X and Y decoding. The matrix 12 (left part) is made physically by a set of memristive devices 10. The accumulation in the accumulators 140 is performed for each row and column activated sequentially by a sequencer. The number of NTG switches used in the hardware realization of the neuron network is then given by the following equation: NTG = M. n + Ω. This embodiment minimizes the number of switches and therefore the complexity of the interconnection matrix, but allows only one operation per cycle and therefore has a priori limited interest compared to a conventional digital implementation. such a network where the memory accesses 25 are sequential. Fig. 16 is a flowchart illustrating the dynamic interconnection step, according to a semi-sequential embodiment (in m cycles). [0040] In step 160, an input neuron is triggered (for example a pixel). The dynamic interconnection between the outputs 121 of the matrix 12 and the nodes 141 corresponding to the output neurons which are in the receiver field of the triggered neuron comprises the steps 162 to 165. [0041] More specifically, in step 162, a decoding of the Y address of the input neuron is performed. The corresponding lines of the matrix 12 are then activated. In step 164, a decoding of the X address of the input neuron is performed. The number of columns to activate sequentially is defined by the parameter m. [0042] In step 165, a column is activated (current line). If a column has been previously enabled, this previous column is disabled beforehand. In step 166, an accumulation of the values is performed in the output matrix 14 by pulse propagation on the current activated column. 
FIG. 16 is a flowchart illustrating the dynamic interconnection step according to a semi-sequential embodiment (in m cycles). [0040] In step 160, an input neuron is triggered (for example a pixel). The dynamic interconnection between the outputs 121 of the matrix 12 and the nodes 141 corresponding to the output neurons located in the receptive field of the triggered neuron comprises steps 162 to 165. [0041] More specifically, in step 162, the Y address of the input neuron is decoded; the corresponding lines of the matrix 12 are then activated. In step 164, the X address of the input neuron is decoded. The number of columns to be activated sequentially is defined by the parameter m. [0042] In step 165, a column (the current column) is activated; if a column was previously activated, that previous column is first deactivated. In step 166, an accumulation of the values is performed in the output matrix 14 by pulse propagation on the currently activated column. The next column is then activated if the number of columns already activated is less than m (i.e. the number of iterations of steps 165/166 is less than m).

FIG. 17 represents an example of hardware realization of the neural network corresponding to a semi-sequential embodiment in X, according to the embodiment of FIG. 16. The accumulators 140 (right-hand part) are connected to the corresponding kernel coefficients after X and Y decoding. The matrix 12 (left-hand part) is physically realized by a set of memristive devices 10. The accumulation in the accumulators 140 is performed for each column activated sequentially by a sequencer. The number of switches NTG used in this hardware realization of the neural network is then given by the following equation: NTG = m · n + n · Oh + Ow.

As shown in FIGS. 13, 15 and 17, the connections made by the dynamic mapping operation between the synapses and the accumulators of the output neurons carry pulses weighted by the value of the synapses. Depending on the coding of these weighted pulses, the connections can be made in different ways (a sketch of the serial case follows this list):
- by analog connections (a single wire), carrying analog pulses whose amplitude or duration encodes the weight;
- by digital connections (a single wire, 1 bit), carrying the weight value in series (bit by bit), coded on a fixed or variable number of bits; a digital coding, binary or unary, is then used;
- by digital connections (N wires, N bits), carrying the weight value in parallel, coded on N bits; a digital coding, binary or unary, is then used.
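For the serial digital option (a single wire, 1 bit), a weight can be sent as a fixed-length bit stream. The encoder/decoder below is a toy illustration of binary coding on N bits; the actual on-chip coding (binary or unary, bit order, sign handling, fixed or variable length) is left open by the description, so every detail here is an assumption.

```python
# Toy serial coding of a synaptic weight on one wire (hypothetical:
# unsigned weights, fixed length, binary coding, MSB first).

def encode_serial(weight, n_bits=8):
    """Weight -> bit stream sent bit by bit, most significant bit first."""
    assert 0 <= weight < 2 ** n_bits
    return [(weight >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]

def decode_serial(bits):
    """Bit stream received in series -> integer weight."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

assert decode_serial(encode_serial(173)) == 173
```

With a unary coding, the same wire would instead carry a number of pulses proportional to the weight, trading transmission time for decoder simplicity.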
Figures 18 to 22 show embodiments of the accumulation step 103 of Figure 7. More specifically, Figure 18 is a flowchart showing the accumulation step according to a kernel-side embodiment. In step 180, an input neuron is triggered (for example a pixel). In step 182, the dynamic interconnection step is performed, for example according to one of the embodiments of Figures 12 to 17. In step 183, the value stored in the accumulator of each output neuron 140 located in the receptive field of the triggered neuron is added to the synaptic weight corresponding to the input neuron (stored in at least one memristive device). The value thus obtained is stored in an accumulator 121 at the output of the synapse connecting the input neuron and the output neuron. [0043] In step 184, the accumulated output values of the synapses are written into the accumulators 140 of the output neurons.

Fig. 19 shows an exemplary hardware embodiment of the accumulator 14 in the kernel-side embodiment, in read-add mode. [0044] Each output 20 of a coefficient of the convolution kernel can be physically realized by a memristive device 10, an adder 21, a storage memory 22 and a switch 23. The switch 23 is configured to switch between the read-add and write-back modes. [0045] In read-add mode, each output corresponding to a coefficient of the convolution kernel can be physically realized by one or more memristive devices 10 storing the synaptic weight, the adder 21 then adding the value stored in the accumulator of each output neuron 140 to the weight stored in the memristive device(s). The storage memory 22 at the output of the synapse then stores the value thus obtained. The storage memory can be realized in analog or digital form.

FIG. 20 represents the hardware embodiment of the accumulation part of FIG. 19 in write-back mode. Once the accumulation has been performed for the group of output neurons connected by the dynamic mapping, located in the receptive field of the input neuron that was activated (i.e. triggered), the values accumulated in the accumulators 22 are propagated to the corresponding output neurons in the accumulators 140. The output of the matrix 14 includes a storage memory 24 for storing the value propagated from the accumulators 22.

FIG. 21 is a flowchart showing the accumulation step (step 103 of FIG. 7) according to an output-side embodiment of the accumulation. In step 210, an input neuron is triggered (for example a pixel). [0046] In step 212, the dynamic interconnection step is performed, for example according to one of the embodiments of FIGS. 12 to 17. In step 214, the value stored in the accumulator of each output neuron 140 located in the receptive field of the triggered neuron is added, on the output matrix side, to the synaptic weight corresponding to the input neuron (stored in at least one memristive device and previously propagated to the output matrix 14). The value thus obtained is stored directly in the output matrix 14.

Fig. 22 shows an exemplary hardware embodiment of the accumulator 14 in the output-side embodiment. Each output of a coefficient of the convolution kernel is directly connected to the memristive device(s) storing the value of the coefficient. Each output of the matrix 14 may comprise an adder 25 and a storage memory 26. [0047] Each output 20 of a coefficient of the convolution kernel can be physically realized by the memristive device 10 storing the synaptic weight, whose value is propagated to the matrix 14. The adder 25 then directly adds the value stored in the accumulator of each output neuron 140 to the weight received. The storage memory 26 on the output matrix side stores the value thus obtained.

Those skilled in the art will understand that the convolution method according to the various embodiments can be implemented in various ways, by hardware, by software, or by a combination of hardware and software. The invention is not limited to the embodiments described above by way of non-limiting example; it encompasses all variant embodiments that may be contemplated by those skilled in the art. In particular, the dynamic interconnection also applies to pooling layers, to match the inputs with the output neurons involved in a pooling computation. [0048] Furthermore, the invention is not limited to a particular type of memristive device. For example, the memristive devices may be of the CBRAM ("conductive bridging RAM"), OXRAM ("oxide-based resistive memory"), PCRAM ("phase-change RAM") or PCM ("phase-change memory") type. [0049] Finally, the invention is not limited to the applications described above and applies in particular to any type of audio, image, video or biological data classification application.
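The kernel-side accumulation of FIGS. 18 to 20 can likewise be summarized as a two-phase software model: a read-add phase that stages, at each synapse output, the sum of the target accumulator and the stored weight, followed by a write-back phase that propagates the staged values to the accumulators 140. The sketch below is illustrative only; the names, the data structures and the coordinate convention are assumptions.

```python
# Illustrative two-phase model of the kernel-side accumulation
# (read-add then write-back); a hypothetical software analogue only.

def kernel_side_update(acc, kernel, y, x):
    n, m = len(kernel), len(kernel[0])
    oh, ow = len(acc), len(acc[0])
    # Phase 1 (read-add, FIG. 19): each synapse output stages the sum of
    # its target neuron's accumulator and its stored weight (memory 22).
    staged = {}
    for i in range(n):
        for j in range(m):
            oy, ox = y - i, x - j
            if 0 <= oy < oh and 0 <= ox < ow:
                staged[(oy, ox)] = acc[oy][ox] + kernel[i][j]
    # Phase 2 (write-back, FIG. 20 / step 184): the staged values are
    # propagated to the output accumulators 140.
    for (oy, ox), value in staged.items():
        acc[oy][ox] = value

K = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
acc = [[0.0] * 4 for _ in range(4)]
kernel_side_update(acc, K, 2, 2)
```

The output-side variant of FIGS. 21 and 22 collapses the two phases: the weight is propagated to the output matrix and added there directly, without the intermediate storage memories 22.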
Claims (13)

[0001] 1. A convolutional neural network (100) comprising a plurality of artificial neurons arranged in one or more convolution layers, each convolution layer comprising one or more output matrices (14) comprising a set of output neurons, each output matrix being connected to an input matrix, comprising a set of input neurons, by artificial synapses associated with a convolution matrix comprising the synaptic weight coefficients corresponding to the output neurons of said output matrix, the output value of each output neuron being determined from the input neurons of said input matrix to which the output neuron is connected and from the synaptic weight coefficients of the convolution matrix associated with said output matrix, characterized in that each synapse consists of a set of memristive devices comprising at least one memristive device, each set of memristive devices storing a coefficient of said convolution matrix, and in that, in response to a change of state of an input neuron of an input matrix, the neural network is able to: dynamically interconnect each set of memristive devices storing the coefficients of the weight matrix to the output neurons connected to said input neuron, and, for each output neuron, accumulate the values of the weight coefficients stored in said sets of memristive devices dynamically interconnected to said output neuron in an output accumulator (140), which provides the output value of said output neuron.

[0002] 2. Neural network according to claim 1, characterized in that the neurons use a temporal coding, and in that the dynamic interconnection is implemented in response to the triggering of an input neuron of the input matrix (11).

[0003] 3. Neural network according to claim 2, characterized in that, in response to the dynamic interconnection, the accumulation of the values of the weight coefficients is implemented by propagation of at least one pulse encoding the value of each weight coefficient according to said temporal coding, the values thus accumulated in the output accumulators constituting the output values of said output matrix (14).

[0004] 4. Neural network according to claim 1, characterized in that it comprises a set of switches and a logic circuit for mapping the synaptic weight coefficients to the output neurons connected to the input neuron having undergone a change of state, from the address of said input neuron in the input matrix, in order to perform said dynamic interconnection.

[0005] 5. Neural network according to one of claims 3 and 4, characterized in that each pulse comprises a bit stream encoding the destination address (X, Y) of the pulse along two perpendicular axes X and Y, the (X, Y) reference frame corresponding to that of the input matrix, and in that, when said pulse arrives at the input matrix, said coded address (X, Y) represents the location of the input neuron to be activated.

[0006] 6. Neural network according to one of the preceding claims, characterized in that the dynamic interconnection is carried out in parallel and in a single cycle, by simultaneously connecting all the weight coefficients stored in said memristive devices (10) to the output neurons connected to the input neuron having undergone a change of state.
[0007] 7. Neural network according to one of claims 1 to 6, characterized in that the dynamic interconnection is performed semi-sequentially, by connecting the coefficients of the weight matrix stored in said memristive devices (10), one line of the matrix after another, to the output neurons connected to the input neuron having undergone a change of state.

[0008] 8. Neural network according to one of claims 1 to 6, characterized in that the dynamic interconnection is performed semi-sequentially, by connecting the coefficients of the weight matrix stored in said memristive devices (10), one column of the matrix after another, to the output neurons connected to the input neuron having undergone a change of state.

[0009] 9. Neural network according to one of claims 1 to 6, characterized in that the dynamic interconnection is performed sequentially, by connecting the coefficients of the convolution matrix (5) stored in said memristive devices (10), one after another, to the output neurons connected to the input neuron having undergone a change of state.

[0010] 10. Neural network according to one of the preceding claims, characterized in that it comprises an accumulator (21, 22) arranged at the output of each synapse (5), said accumulator accumulating the value of the weight coefficient stored in the memristive devices of said synapse with the value stored in the accumulator (140, 24) of the corresponding output neuron, the value stored in said auxiliary accumulator then being propagated by said pulses into said accumulator (140, 24) of the output matrix.

[0011] 11. Neural network according to one of claims 1 to 9, characterized in that it comprises an accumulator (23, 25) arranged at each output neuron, the output of each memristive device (10) of the synapse being propagated to the output neuron, the value thus propagated being accumulated with the value stored in the accumulator (140, 23).

[0012] 12. Neural network according to one of the preceding claims, characterized in that the neuron outputs are grouped, and in that the accumulated values corresponding to grouped outputs are stored in a common accumulator (142).

[0013] 13. Neural network according to one of the preceding claims, characterized in that it implements an on-line learning of the STDP type based on the dynamic interconnection.
|